Improving Rare Case Prediction with Replication Technique
Authors
Abstract
The ability to correctly predict rarely occurring cases is important to the success of applying data mining methods to many real-life applications. In the context of data mining, rare cases are labeled data instances that occur infrequently in the database. Discovering infrequent patterns is of interest in specific domains such as genetic mutant identification, credit card fraud detection, and network intrusion prevention. Most learning algorithms, however, are biased toward the majority cases: minority cases are treated as noise and ignored during the model induction steps. As a result, the learning algorithm generates a model that cannot classify or predict a minority case. We therefore study a replication technique based on the over-sampling method to solve this problem. A straightforward application of over-sampling, however, may lead to over-fitting, in which the generated model is too specific to the manipulated data. We thus apply a cluster-based technique to selectively filter the training dataset. The experimental results on the primary tumor, arrhythmia, and communities-and-crime datasets show significant improvement in the predictive accuracy, specificity, and sensitivity of the induced models. The results on the multiple features correlation dataset, however, show no significant improvement; this case requires further investigation.
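The replication idea described in the abstract can be illustrated with a minimal sketch, assuming plain random over-sampling: minority-class instances are duplicated until every class reaches the size of the largest class. The function name `replicate_minority`, its parameters, and the toy data are hypothetical and are not taken from the paper. The cluster-based filtering step is not sketched, because the abstract does not specify how it works.

```python
import random
from collections import Counter

def replicate_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many instances as the largest class.
    Hypothetical helper for illustration, not the authors' implementation."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate instances of this class to copy from.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (assumed): four majority instances and one rare instance.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = ["common", "common", "common", "common", "rare"]
new_rows, new_labels = replicate_minority(rows, labels)
balanced_counts = Counter(new_labels)
```

Because the copies are exact duplicates, a model trained on the result can become overly specific to the replicated instances, which is the over-fitting risk the abstract mentions.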
Similar resources
Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources, enabling users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of files and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...
Improve Replica Placement in Content Distribution Networks with Hybrid Technique
The increasing use of the Internet and its accelerated growth reduce the available network bandwidth and server capacity. As a result, the quality of Internet services becomes unacceptable to users, even though efficient and effective delivery of web content plays an important role in improving performance. Content distribution networks were introduced to address this issue. Replicatin...
Improving Data Replication in Mobile Grids using Mobility Prediction
Data replication is a technique used in mobile grid environments to enhance system reliability by increasing data availability and reducing access latency and network utilization. Due to the dynamic nature of mobile grids, replica placement becomes one of the most important challenges. It has a great impact on the performance of the whole system. Efficient placement strategies should consider b...
A Survey of Dynamic Replication Strategies for Improving Response Time in Data Grid Environment
Large-scale data management is a critical problem in distributed systems such as cloud, P2P systems, the World Wide Web (WWW), and Data Grids. One effective solution is the data replication technique, which efficiently reduces the cost of communication and improves data reliability and response time. Various replication methods can be proposed depending on when, where, and how replicas are gener...
Prediction of shear and Compressional Wave Velocities from petrophysical data utilizing genetic algorithms technique: A case study in Hendijan and Abuzar fields located in Persian Gulf
Shear and compressional wave velocities, along with other petrophysical logs, are considered among the most important data for hydrocarbon reservoir characterization. Shear wave velocity (Vs) in well logging is commonly measured by dipole logging tools that can acquire shear waves as well as compressional waves, such as Sonic Scanner, DSI (Dipole Shear Sonic imager) by Schlumbe...